ABSTRACT

The phenomenon of simultaneous differential item 
functioning (DIF) amplification and cancellation and the role of the 
SIBTEST computer program in detecting it were studied. A variety of 
simulated test data was generated for this purpose. In addition, the 
following real test data were used: (1) American College Testing 
program data for 2,115 males and 2,885 females in mathematics; (2) 
National Assessment of Educational Progress (NAEP) history test data 
for 1,225 males and 1,215 females; and (3) NAEP data for 1,711 whites 
and 447 blacks. The results from both simulated and real data, as the 
theory of R. Shealy and W. F. Stout suggests, show that SIBTEST is 
effective in detecting DIF amplification and cancellation (partially 
or fully) at the test score level. Finally, methodological and 
substantive implications of DIF amplification and cancellation are 
discussed. Ten tables present analysis results. (SLD) 






Simultaneous DIF Amplification and Cancellation: 
Shealy-Stout's Test for DIF 



Ratna Nandakumar¹
Department of Educational Studies 
University of Delaware 



August 15, 1992 



Prepared for the Cognitive Science Research Program, Cognitive and Neural Sciences Division, Office of Naval Research, under grant number N00014-90-J-1940, 4421-548. Approved for public release, distribution unlimited. Reproduction in whole or in part is permitted for any purpose of the United States Government.



¹ The author would like to convey special thanks to William Stout for his insightful suggestions on this research and to Louis Roussos for programming assistance.






REPORT DOCUMENTATION PAGE 



Form Approved 
OMB No. 0704-0188




1. AGENCY USE ONLY (Leave blank) 



2. REPORT DATE

15 August 1992



3. REPORT TYPE AND DATES COVERED 

Technical : 1990-93 



4. TITLE AND SUBTITLE 

Simultaneous DIF Amplification and Cancellation:
Shealy-Stout's Test for DIF 



6. AUTHOR(S) 

Ratna Nandakumar 



7 PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 

Department of Statistics 
University of Illinois
725 South Wright Street 
Champaign, IL 61820 



9. SPONSORING /MONITORING AGENCY NAME(S) AND ADDRESS(ES) 

Cognitive Sciences Program 
Office of Naval Research 
800 N. Quincy 
Arlington, VA 22217-5000 



5. FUNDING NUMBERS 

N00014-90-J-1940



8. PERFORMING ORGANIZATION 
REPORT NUMBER

1992 - No, k 



10. SPONSORING/MONITORING 
AGENCY REPORT NUMBER 



4421-548



11. SUPPLEMENTARY NOTES 

To be published in Journal of Educational Measurement 



12a. DISTRIBUTION /AVAILABILITY STATEMENT 

Approved for public release; distribution unlimited 



12b. DISTRIBUTION CODE 



13. ABSTRACT (Maximum 200 words) 

See reverse 



14. SUBJECT TERMS 

See reverse 



15. NUMBER OF PAGES 

33 



16. PRICE CODE 



17. SECURITY CLASSIFICATION 
OF REPORT 

unclassified 



18. SECURITY CLASSIFICATION 
OF THIS PAGE 

unclassified 



19. SECURITY CLASSIFICATION 
OF ABSTRACT 

unclassified 



20. LIMITATION OF ABSTRACT 



UL 



NSN 7540-01-280-5500



Standard Form 298 (Rev 2-89) 
Prescribed by ANSI Std. Z39-18

298-102



Simultaneous DIF Amplification and Cancellation: Shealy-Stout's Test for DIF

Abstract 

The present study investigates the phenomena of simultaneous DIF amplification and cancellation and SIBTEST's role in detecting them. A variety of simulated test data were generated for this purpose. In addition, real test data from various sources were used. The results from both simulated and real test data, as Shealy and Stout's theory suggests, show that SIBTEST is effective in assessing DIF amplification and cancellation (partial or full) at the test score level. Finally, methodological and substantive implications of DIF amplification and cancellation are discussed.

Subject terms: SIBTEST, DIF, item bias, test bias, bias amplification, bias cancellation
Simultaneous DIF Amplification and Cancellation: Shealy-Stout's Test for DIF

Studies of bias have been widely prevalent in educational measurement since the 1960s. Early attempts to study bias in tests were largely based on the notion of predictive validity. Consequently, a number of regression models, based on different definitions of fairness, were developed to achieve fair employment selection and college admissions (Peterson and Novick, 1976). Since the advent of item response theory (IRT), however, the study of bias and differential item functioning (DIF) at the item level has gained much popularity. Researchers have developed several methodologies to study item bias and DIF (for descriptions and/or comparisons of different procedures, see, for example, Angoff, 1982; Cleary & Hilton, 1968; Dorans & Kulick, 1983, 1986; Hambleton & Rogers, 1989; Holland & Thayer, 1988; Hunter, 1975; Ironson, 1982; Lord, 1980; Raju, 1988; Reynolds, 1982; Scheuneman, 1979; Shealy & Stout, 1992b; Shepard, Camilli, & Averill, 1981; Swaminathan & Rogers, 1990; Wainer, Sireci, & Thissen, 1991).

These procedures can usually be used to detect either item bias or DIF. The subtle distinction between these closely related concepts can be explained as follows. In the conceptualization of "item bias", it is generally assumed that the validity of some items of the test could be questionable while the rest of the items are considered valid. That is, these items of questionable validity could contribute to test score differences between groups of examinees with equal ability. In DIF analyses, however, it is conceptualized that some items could contribute to test score differences between two groups of examinees matched according to some criterion about which no validity claim is made. For example, examinees could be matched on total test score with no accompanying claim of validity for the items of the test. Therefore, in item bias analyses the construct validity of the matching subtest needs to be established, while in DIF analyses it does not. In this sense item bias is a special case of DIF. Several biased items acting in concert produce test bias, and several DIF items acting in concert produce DTF (differential test functioning). Shealy and Stout (1992b) discuss the differences between bias and DIF analyses in more detail.

One recently developed IRT-based methodology for detecting item/test bias or DIF/DTF is that of Shealy and Stout (1992a, 1992b). Known as SIBTEST (SIB denotes simultaneous item bias), it is a statistical test to detect bias present in one or more items of a test simultaneously. SIBTEST is an outgrowth of the multidimensional IRT modeling of test bias presented in Shealy and Stout (1992a), and it is the first IRT-based procedure to allow simultaneous testing for bias present in more than one item. Simultaneous item bias is said to occur when several biased items acting in concert affect the test score differentially for different examinee subpopulations, resulting in test bias. Partly because of its multidimensional modeling approach, SIBTEST has several distinct features. First, single item bias as well as simultaneous item bias can be detected. Second, a formal distinction can be made between genuine test bias and impact, which is due to group differences in the ability intended to be measured (Ackerman, 1991a; Dorans, 1989). Third, the underlying psychological (cognitive) mechanisms that produce bias can be explicitly addressed by contrasting the target ability with nuisance determinants. The target ability $\theta$ is the ability intended to be measured by the test; a nuisance determinant $\eta$ is an ability or construct not intended to be measured by the test but influencing the responses to one or more items.

One of the major advantages of considering simultaneous item bias is that it is 
possible to study item bias amplification and item bias cancellation. Bias amplification is 
illustrated by the following: if a set of individual items is each biased against males, then 
one can study the effect of the bias collectively against males at the overall test score level. 
Bias cancellation is illustrated by the following: if one set of individual items is each biased 
against males and another set of items is each biased against females, then it is possible
that at the overall test score level the respective biases might cancel each other out. In any 
bias study one should investigate both of these possibilities. The phenomenon of item bias 
cancellation has been previously studied empirically by Drasgow (1987), Roznowski (1987), 
and Reith and Roznowski (1991). 

Reith and Roznowski (1991) and Roznowski (1987) have studied the effect of biased 
items on the predictive validity of the test. They concluded that inclusion of biased items 
in the test can actually contribute to increased predictive validity when the sources of bias 
are diverse and multiply determined. They argue that, although items with non-trait (but
trait-relevant) variance may manifest bias at the item level, nonetheless, several such 
items can actually improve the amount of variance explained by the trait at the test score 
level (here "trait" refers to the ability of interest). This is because, at the test score level, 
the amount of non-trait variance diminishes while the trait variance increases, thus 
improving the predictive validity. Thus, the removal of biased items might sometimes be 
considered to be detrimental to the predictive validity of the test. 

Drasgow (1987) has shown, using Lord's chi-square item bias statistic, that several biased items of the ACT mathematics usage and English usage tests, biased in different directions (some against Whites, some against Blacks, some against Hispanics, etc.), had no cumulative bias effect on the expected number-correct score. That is, there were no consistent differences in test scores across groups. This was attributed to bias cancellation across groups. Humphreys (1970, 1986) has long recommended deliberate inclusion of diverse non-trait determinants in test items in order to diminish the biasing influence of any particular non-trait ability at the test score level. These studies clearly show that the amplification or cancellation of biased or DIF items at the test score level is a significant problem. Shealy and Stout (1992a) directly address these issues by modeling bias in a multidimensional framework and considering the simultaneous influence of several biased items at once. According to them, the presence of multidimensionality is a prerequisite for bias: if test data can be modeled by a unidimensional or an essentially unidimensional (Stout, 1990) model, then bias cannot exist. The concept of bias in a multidimensional framework has also been emphasized by Shepard (1982), Kok (1988), and others. As noted before, the SIBTEST procedure is an outgrowth of the multidimensional modeling of bias.

Shealy and Stout (1992a, 1992b) have demonstrated through simulation studies the ability of SIBTEST to detect unidirectional bias, that is, bias against the same group regardless of the level of target ability $\theta$. In their simulations, they used two- and three-parameter logistic models with varying sample sizes and differing degrees of induced bias. SIBTEST adhered well to the nominal level of significance in cases of no bias and displayed good power in cases where one or more items were biased, even when the amount of bias was fairly small. In single item bias studies, the performance of SIBTEST was compared to that of the Mantel-Haenszel statistic; the two procedures produced consistent results with respect to the direction and the amount of estimated bias.

The purpose of this paper is to define the concepts of DIF amplification and DIF cancellation and to investigate the power of SIBTEST to detect these phenomena. A series of real and simulated data sets are used for this purpose. In single item analyses, SIBTEST results are compared with Mantel-Haenszel results. A brief description of the SIBTEST procedure is also provided.

Description of SIBTEST Procedure 

In this section, for ease of presentation, we assume the bias viewpoint rather than the DIF/DTF viewpoint. It is vital, however, to realize that a similar presentation could have been given from the DIF/DTF perspective. As discussed before, SIBTEST results have either a test bias or a DTF interpretation, depending upon the level of user assumptions about the validity of the matching subtest items. In particular, SIBTEST can be used as a DIF procedure if desired.

Two groups (or subpopulations) of interest, the reference group (R) and the focal group (F), are assumed to take a given test. The complete latent space $\Theta$ underlying the test items is assumed to be multidimensional: $\Theta = (\theta, \eta)$, where $\theta$ is the target ability, intended to be measured by the test, and $\eta$ is the nuisance ability vector (possibly multidimensional), not intended to be measured by the test items. For example, in an English vocabulary test, some items might be male oriented, such as those requiring knowledge of sports, and other items might be female oriented, such as those requiring knowledge of domestic tasks. In such a situation, English vocabulary skill is the ability intended to be measured ($\theta$); knowledge of sports ($\eta_1$) and knowledge of domestic tasks ($\eta_2$) are nuisance abilities. Let $U$ denote the test response vector and $h(U)$ the test scoring method. Number correct is used as the scoring method throughout this paper. It is assumed that all items of the given test measure the target ability $\theta$, and that some items (biased items) measure both the target ability and one or more nuisance abilities $\eta$. It is also assumed that the usual IRT assumptions of local independence, monotonicity, and group invariance hold with respect to $\Theta$, and that this collection of assumptions does not hold for any subset of the components of $\Theta$.

The statistical procedure for testing the null hypothesis of no test bias is briefly explained below; for details, see Shealy and Stout (1992b). The hypothesis can be stated as

$$H_0: \beta_U = 0 \quad \text{vs.} \quad H_1: \beta_U > 0,$$

where $\beta_U$ is a parameter denoting the amount of unidirectional test bias against the focal group. Unidirectional bias occurs if the probability of correctly answering the item(s) is consistently higher (or lower) for one group than for the other over all levels of ability $\theta$. That is, the marginal item characteristic curves² for the two groups do not cross as $\theta$ varies over the ability range. Let $X = \sum_{i=1}^{n} U_i$ be the total score on the valid subtest, which, by definition, consists of the $n$ items the user is willing to assume measure the target ability. Let $Y = \sum_{i=n+1}^{N} U_i$ be the total score on the studied subtest, which consists of one or more items measuring the target ability and possibly nuisance abilities. It is assumed that, for long tests, examinees with the same valid subtest score have approximately equal target ability $\theta$ and thus are comparable. Following this logic, examinees within the reference and focal groups are subgrouped according to their total score on the valid subtest. Examinees with the same valid subtest score are then compared across the reference and focal groups on their performance on the studied subtest item(s). The test statistic, which is a form of standardization index (see Dorans & Kulick, 1986), for testing the null hypothesis of no bias is then given by

$$B = \frac{\hat\beta_U}{\hat\sigma(\hat\beta_U)}, \qquad (1)$$

where

$$\hat\beta_U = \sum_{k=0}^{n} \hat p_k \left( \bar Y^*_{Rk} - \bar Y^*_{Fk} \right)$$

and $\hat p_k$ is the proportion of focal group examinees attaining $X = k$ on the valid subtest. $\bar Y^*_{Rk}$ and $\bar Y^*_{Fk}$ are the "adjusted" means of the studied subtest score for examinees with a valid subtest score of $X = k$ ($k = 0, 1, \ldots, n$) in the reference and focal groups, respectively. Because the procedure must work for short as well as long tests, these means are adjusted for differences in the $\theta$ distributions between the reference and focal groups arising from short test lengths (for example, 25 items) and from inherent differences in the $\theta$ distributions of the two groups (for details, see the regression correction in Shealy & Stout, 1992b). $\hat\sigma(\hat\beta_U)$ is the estimated standard error of $\hat\beta_U$, given by

$$\hat\sigma(\hat\beta_U) = \left( \sum_{k=0}^{n} \hat p_k^{\,2} \left[ \frac{\hat\sigma^2(Y \mid k, R)}{J_{Rk}} + \frac{\hat\sigma^2(Y \mid k, F)}{J_{Fk}} \right] \right)^{1/2},$$

where $\hat\sigma^2(Y \mid k, g)$ is the sample variance of the studied subtest score for examinees in group $g$ ($R$ or $F$) with a total score of $k$ on the valid subtest, and $J_{Rk}$ and $J_{Fk}$ are the sample sizes in the reference and focal groups, respectively, with a total score of $k$ on the valid subtest.

The null hypothesis of no bias is rejected with error rate $\alpha$ if the value of $B$ exceeds the upper $100(1-\alpha)$th percentile point of the standard normal distribution. $\hat\beta_U$ is also the statistic used to estimate the amount of unidirectional bias. For example, a value of 0.1 indicates that the average difference in the expected total test scores between reference and focal group examinees of similar ability is 0.1. If this is the result of a single studied item with the remainder of the items assumed valid, then $\hat\beta_U = 0.1$ is the estimated difference in the probability of answering the studied item correctly between reference and focal group examinees of similar ability. Positive values of $\hat\beta_U$ indicate bias against the focal group, and negative values of $\hat\beta_U$ indicate bias against the reference group. Simulation studies by Shealy and Stout (1992b) showed that $B$ has good statistical properties, such as good adherence to the nominal significance level and high power.

Simulation Study 
Details about Simulations 

In order to investigate amplification and cancellation of DIF and the use of SIBTEST to detect them, a simulation study was designed to model realistic situations. Item parameters $(a_i, b_i, c_i)$ of the valid subtests were obtained from the literature, and the item parameters of the studied subtests were hand selected to control the amount of DIF present. The estimated item parameters from the SAT-Verbal (Drasgow, 1987) were used for the valid subtests. The parameters of the studied subtest items (that is, the DIF items) are listed in Table 1. They were selected such that the difficulty parameters were all centered around zero, with varying discrimination parameters for $\theta$, $\eta_1$, and $\eta_2$. All studied subtest items except the last three are influenced by $\theta$ and $\eta_1$; the last three items are influenced by $\theta$ and $\eta_2$. The guessing level is fixed at 0.2 for all items. For the amplification studies, only items with nuisance ability $\eta_1$ were used. For the amplification and cancellation study, both items with nuisance ability $\eta_1$ and items with nuisance ability $\eta_2$ were used.
Amplification Study 

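The target and the nuisance abilities were generated from a bivariate normal distribution as follows. For notational simplicity, the subscript of $\eta_1$ is dropped:

$$\begin{pmatrix} \theta \\ \eta \end{pmatrix} \Bigg|\, g \sim N_2\!\left( \begin{pmatrix} \mu_{\theta g} \\ \mu_{\eta g} \end{pmatrix}, \begin{pmatrix} \sigma^2(\theta \mid g) & \rho\,\sigma(\theta \mid g)\,\sigma(\eta \mid g) \\ \rho\,\sigma(\theta \mid g)\,\sigma(\eta \mid g) & \sigma^2(\eta \mid g) \end{pmatrix} \right), \quad g = R, F, \qquad (2)$$

where $\rho$ is the correlation between $\theta$ and $\eta$ for group $g$, which is set at 0.5 for both groups (different values of $\rho$ across groups tend to produce bidirectional DIF). The variances $\sigma^2(\theta \mid g)$ and $\sigma^2(\eta \mid g)$ were set at 1. The means $\mu_{\theta g}$ and $\mu_{\eta g}$ for each group were determined through specification of other parameters as follows.

The target ability difference between the reference and focal groups is denoted by

$$d_T = \frac{\mu_{\theta R} - \mu_{\theta F}}{\sigma_{\theta P}}, \qquad (3)$$

where

$$\sigma^2_{\theta P} = \frac{J_R\,\sigma^2(\theta \mid R) + J_F\,\sigma^2(\theta \mid F)}{J_R + J_F},$$

$J_R$ and $J_F$ denote the sample sizes in the reference and focal groups, respectively, and $\sigma^2_{\theta P}$ is the weighted average of the variances of the reference and focal groups on the target ability. Since $\sigma^2(\theta \mid R)$ and $\sigma^2(\theta \mid F)$ were taken as 1 in the simulation studies (see Equation 2), $d_T = \mu_{\theta R} - \mu_{\theta F}$, which is a measure of how much the two groups differ in their target ability distributions (that is, impact).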

Another criterion for choosing $\mu_{\theta R}$ and $\mu_{\theta F}$ was that the average difficulty level $\bar b$ of the valid subtest items was assumed equal to the average target ability pooled across groups:

$$\bar b = \frac{J_R\,\mu_{\theta R} + J_F\,\mu_{\theta F}}{J_R + J_F}. \qquad (4)$$

That is, on average, the difficulty of the valid subtest items is assumed to be well matched with the pooled average target ability of the two groups. By specifying $d_T$ and $\bar b$, Equations 3 and 4 together determine $\mu_{\theta R}$ and $\mu_{\theta F}$. Parameters $\mu_{\eta R}$ and $\mu_{\eta F}$ were determined as follows.

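The potential for DIF, $C_\eta$, is defined as the difference between the conditional expectations of $\eta$ for the two groups, given by

$$C_\eta = E[\eta_R \mid \theta] - E[\eta_F \mid \theta].$$

Following Equations 2 and 3 (with unit variances),

$$C_\eta = (\mu_{\eta R} - \mu_{\eta F}) - \rho\,(\mu_{\theta R} - \mu_{\theta F}). \qquad (5)$$

Another criterion for choosing the means of $\eta$ is that, for an "average" value of target ability ($\theta = 0$), the conditional nuisance ability is assumed to be centered around the chosen target ability value (zero) for the two groups. Namely,

$$E[\eta_R \mid \theta = 0] = -E[\eta_F \mid \theta = 0].$$

That is,

$$\mu_{\eta R} + \mu_{\eta F} = \rho\,(\mu_{\theta R} + \mu_{\theta F}). \qquad (6)$$

Once $\mu_{\theta R}$ and $\mu_{\theta F}$ are known, specifying $C_\eta$ determines $\mu_{\eta R}$ and $\mu_{\eta F}$ from Equations 5 and 6.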
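Solving Equations 3 through 6 under the unit-variance setting gives closed forms: $\mu_{\theta R} = \bar b + J_F d_T/(J_R + J_F)$, $\mu_{\theta F} = \bar b - J_R d_T/(J_R + J_F)$, $\mu_{\eta R} = \rho\,\mu_{\theta R} + C_\eta/2$, and $\mu_{\eta F} = \rho\,\mu_{\theta F} - C_\eta/2$. The Python sketch below computes them; it assumes the sample-size-weighted pooling of Equation 4, and the function names are hypothetical.

```python
def ability_means(d_t, b_bar, j_r, j_f, c_eta, rho):
    """Solve Equations 3-6 for (mu_theta_R, mu_theta_F, mu_eta_R, mu_eta_F).

    Assumes unit variances (so sigma_theta_P = 1 and d_T = mu_theta_R - mu_theta_F)
    and that Equation 4 pools the theta means with sample-size weights.
    """
    j = j_r + j_f
    # Eq. 3 and 4: mu_R - mu_F = d_T and (J_R mu_R + J_F mu_F) / J = b_bar.
    mu_theta_r = b_bar + j_f * d_t / j
    mu_theta_f = b_bar - j_r * d_t / j
    # Eq. 5: mu_eta_R - mu_eta_F = C_eta + rho (mu_theta_R - mu_theta_F).
    # Eq. 6: mu_eta_R + mu_eta_F = rho (mu_theta_R + mu_theta_F).
    mu_eta_r = rho * mu_theta_r + c_eta / 2
    mu_eta_f = rho * mu_theta_f - c_eta / 2
    return mu_theta_r, mu_theta_f, mu_eta_r, mu_eta_f


def conditional_eta_mean(theta, mu_theta, mu_eta, rho):
    """E[eta | theta] for a bivariate normal with unit variances."""
    return mu_eta + rho * (theta - mu_theta)
```

With $d_T = 0$ and $\bar b = 0$, as in the simulations reported here, $C_\eta = 0.5$ and $\rho = 0.5$ give $\mu_{\eta R} = 0.25$ and $\mu_{\eta F} = -0.25$; the helper `conditional_eta_mean` can be used to confirm that the gap $E[\eta_R \mid \theta] - E[\eta_F \mid \theta]$ equals $C_\eta$ at every $\theta$.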

The choice of values for $C_\eta$ in the simulations was guided by the desired amount of estimated DIF, $\hat\beta_U$. In other words, values of $C_\eta$ were chosen so that the amount of estimated DIF would be "small" ($0 < \hat\beta_U \le 0.05$), "moderate" ($0.05 < \hat\beta_U < 0.1$), or "large" ($\hat\beta_U \ge 0.1$). From a practical viewpoint, the standard used to determine what is meant by small, moderate, or large DIF was based on observed delta values of the Mantel-Haenszel statistic (Holland & Thayer, 1988). An approximate empirical relationship between $\Delta_{MH}$ and $\hat\beta_U$ is given by

$$\hat\beta_U \approx -\frac{\Delta_{MH}}{15}. \qquad (7)$$

Recall that $\beta_U$ is a measure of the average difference in expected test scores between reference and focal group members of similar ability. That is, $\beta_U$, as estimated by $\hat\beta_U$, can be useful for direct interpretations of DIF in terms of differing expectations of total score for the two groups.
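Equation 7 and the size categories can be applied mechanically, as in the short Python sketch below. The helper names are hypothetical, and classifying negative values by their absolute size is an assumption made so that DIF against the reference group is labeled on the same scale. For example, the Mantel-Haenszel mean of -.342 reported for item 4 in the simulation study converts to about .023, close to the SIBTEST estimate of .022.

```python
def beta_from_delta_mh(delta_mh):
    """Equation 7: approximate beta_U from a Mantel-Haenszel delta value."""
    return -delta_mh / 15.0


def dif_size(beta_hat):
    """Label estimated DIF by magnitude using the .05 and .1 cutoffs."""
    # Assumption: the sign (direction of DIF) is ignored here and
    # reported separately; only the magnitude is classified.
    size = abs(beta_hat)
    if size <= 0.05:
        return "small"
    if size < 0.1:
        return "moderate"
    return "large"
```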

In the simulation studies presented here, $d_T$ was taken as zero; that is, the target ability means of the two groups were equal. For simulation studies where $d_T \ne 0$, see Shealy and Stout (1992b). Two values of $C_\eta$ were considered: 0.5 and 1.0. Positive values of $C_\eta$ denote DIF against the focal group, and negative values denote DIF against the reference group. Three combinations of examinee sample sizes $(J_F, J_R)$, typical of those commonly occurring in applications, were considered: (500, 500), (1000, 3000), and (1000, 1000). Two valid subtest lengths $n$ were considered: 25 and 50 items. These items were randomly selected from a pool of 80 sets of estimated three-parameter logistic item parameters. Item responses for the valid subtest were generated by using the three-parameter logistic model:



$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp[-1.7\,a_i(\theta - b_i)]}, \quad i = 1, \ldots, n, \qquad (8)$$

where $a_i$, $b_i$, and $c_i$ are the discrimination, difficulty, and guessing parameters of item $i$.

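Item responses for the studied subtest were generated by using the two-dimensional three-parameter logistic model with compensatory abilities (Reckase & McKinley, 1983):

$$P_i(\theta, \eta) = c_i + \frac{1 - c_i}{1 + \exp\{-1.7\,[a_{i\theta}(\theta - b_i) + a_{i\eta}(\eta - b_i)]\}}, \quad i = n+1, \ldots, N, \qquad (9)$$

where $a_{i\theta}$ and $a_{i\eta}$ are the discrimination parameters of item $i$ for $\theta$ and $\eta$.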



For each simulated examinee (that is, each draw of $(\theta, \eta)$ from Equation 2), binary item responses (0, 1) were obtained as follows. The probability of correctly answering each valid subtest item was computed using Equation 8. If a simulated uniform random value on the interval (0, 1) was less than or equal to the computed $P_i(\theta)$, the item was considered answered correctly and a score of 1 was assigned; otherwise the item was considered answered incorrectly and a score of 0 was assigned. Similarly, for studied items $P_i(\theta, \eta)$ was computed using Equation 9, and a score of 0 or 1 was assigned.
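The generation scheme of Equations 2, 8, and 9 can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the code used in the study: unit variances as in the simulations, the compensatory form of Equation 9 written with a common difficulty $b_i$ for both dimensions, and made-up item parameters in the example (the Table 1 and SAT-Verbal values are not reproduced here). Function names are hypothetical.

```python
import numpy as np


def p_3pl(theta, a, b, c):
    """Equation 8: three-parameter logistic probability of a correct answer."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))


def p_2d(theta, eta, a_theta, a_eta, b, c):
    """Equation 9 (assumed form): compensatory two-dimensional 3PL."""
    z = a_theta * (theta - b) + a_eta * (eta - b)
    return c + (1 - c) / (1 + np.exp(-1.7 * z))


def simulate_group(n_examinees, mu_theta, mu_eta, rho,
                   valid_items, studied_items, rng):
    """Return an examinees-by-items 0/1 matrix: valid items, then studied items.

    valid_items: list of (a, b, c); studied_items: list of (a_theta, a_eta, b, c).
    """
    # Equation 2 with unit variances: the covariance equals the correlation rho.
    cov = [[1.0, rho], [rho, 1.0]]
    theta, eta = rng.multivariate_normal([mu_theta, mu_eta], cov,
                                         size=n_examinees).T
    a, b, c = (np.array(col) for col in zip(*valid_items))
    p_valid = p_3pl(theta[:, None], a, b, c)
    a_t, a_e, b_s, c_s = (np.array(col) for col in zip(*studied_items))
    p_studied = p_2d(theta[:, None], eta[:, None], a_t, a_e, b_s, c_s)
    p = np.hstack([p_valid, p_studied])
    # Score 1 when a uniform (0, 1) draw is less than or equal to P, else 0.
    return (rng.uniform(size=p.shape) <= p).astype(int)


# Hypothetical parameters: five valid items and one studied item.
rng = np.random.default_rng(1)
responses = simulate_group(200, 0.0, 0.25, 0.5,
                           [(1.0, 0.0, 0.2)] * 5,
                           [(1.0, 0.5, 0.0, 0.2)], rng)
```

Drawing one group at a time keeps the reference and focal means of Equation 2 separate; calling `simulate_group` once per group with its own means reproduces the two-group design.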






Amplification and Cancellation of DIF 13 



Cancellation Study 



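Since there are two nuisance abilities, $\eta_1$ and $\eta_2$, in this case, they are generated as follows. $\theta$ and $\eta_1$ have a bivariate normal distribution given by

$$\begin{pmatrix} \theta \\ \eta_1 \end{pmatrix} \Bigg|\, g \sim N_2\!\left( \begin{pmatrix} \mu_{\theta g} \\ \mu_{\eta_1 g} \end{pmatrix}, \begin{pmatrix} \sigma^2(\theta \mid g) & \rho\,\sigma(\theta \mid g)\,\sigma(\eta_1 \mid g) \\ \rho\,\sigma(\theta \mid g)\,\sigma(\eta_1 \mid g) & \sigma^2(\eta_1 \mid g) \end{pmatrix} \right), \quad g = R, F, \qquad (10)$$

and $\theta$ and $\eta_2$ have a bivariate normal distribution given by

$$\begin{pmatrix} \theta \\ \eta_2 \end{pmatrix} \Bigg|\, g \sim N_2\!\left( \begin{pmatrix} \mu_{\theta g} \\ \mu_{\eta_2 g} \end{pmatrix}, \begin{pmatrix} \sigma^2(\theta \mid g) & \rho\,\sigma(\theta \mid g)\,\sigma(\eta_2 \mid g) \\ \rho\,\sigma(\theta \mid g)\,\sigma(\eta_2 \mid g) & \sigma^2(\eta_2 \mid g) \end{pmatrix} \right), \quad g = R, F, \qquad (11)$$

where $\rho$ is the correlation between $\theta$ and $\eta_1$, and between $\theta$ and $\eta_2$, which is taken to be 0.5 for both groups. Also, $\eta_1$ and $\eta_2$ were generated independently of each other for each fixed $\theta$. As in the amplification case, the variances $\sigma^2(\theta \mid g)$, $\sigma^2(\eta_1 \mid g)$, and $\sigma^2(\eta_2 \mid g)$ were all taken to be 1. The means $\mu_{\theta R}$ and $\mu_{\theta F}$ were determined by Equations 3 and 4. The means $\mu_{\eta_1 R}$, $\mu_{\eta_1 F}$, $\mu_{\eta_2 R}$, and $\mu_{\eta_2 F}$ were determined through Equations 12 and 13 as follows:

$$\mu_{\eta_i R} - \mu_{\eta_i F} = C_{\eta_i} + \rho\,(\mu_{\theta R} - \mu_{\theta F}), \quad i = 1, 2, \qquad (12)$$

and

$$\mu_{\eta_i R} + \mu_{\eta_i F} = \rho\,(\mu_{\theta R} + \mu_{\theta F}), \quad i = 1, 2, \qquad (13)$$

where $C_{\eta_i}$ is the potential for DIF caused by the nuisance ability $\eta_i$ and is chosen just as for the amplification case. Item responses were generated just as in the amplification case using Equations 8 and 9. Here Equation 9 applies to $(\theta, \eta_1)$ or $(\theta, \eta_2)$ depending upon item number. For example, items 1 through 11 of Table 1 depend upon $\theta$ and $\eta_1$, and items 12 through 14 depend upon $\theta$ and $\eta_2$.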
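Because $\eta_1$ and $\eta_2$ are conditionally independent given $\theta$, one convenient way to draw from Equations 10 and 11 with unit variances is $\eta_i = \mu_{\eta_i g} + \rho(\theta - \mu_{\theta g}) + \sqrt{1 - \rho^2}\,\varepsilon_i$ with independent standard normal $\varepsilon_i$. This construction is an implementation assumption, not taken from the study; it implies that $\eta_1$ and $\eta_2$ are marginally correlated at $\rho^2 = 0.25$ when $\rho = 0.5$. The function name is hypothetical.

```python
import numpy as np


def simulate_abilities(n, mu_theta, mu_eta1, mu_eta2, rho, rng):
    """Draw (theta, eta1, eta2) per Equations 10-11 with unit variances.

    Each eta_i is bivariate normal with theta (correlation rho), and eta1 and
    eta2 are conditionally independent given theta.
    """
    theta = rng.normal(mu_theta, 1.0, size=n)
    # Residual SD chosen so that Var(eta_i) = rho**2 + (1 - rho**2) = 1.
    resid_sd = np.sqrt(1 - rho ** 2)
    eta1 = mu_eta1 + rho * (theta - mu_theta) + resid_sd * rng.normal(size=n)
    eta2 = mu_eta2 + rho * (theta - mu_theta) + resid_sd * rng.normal(size=n)
    return theta, eta1, eta2
```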

Results of Simulation Study 

Three different simulation studies were done, each with varying values for {-^jiiJp}, 
Cp, and N. The results for Amplification Study 1 are shown in Table 2. This study has 500 
examinees in each of the focal and reference groups with 50 items in the valid subtest. The 
first column denotes the item numbers (taken from Table 1) used in the studied subtest; 
the second column denotes the degree of potential for DIF induced in the simulations (C^); 

the third column denotes the average estimated DIF over 100 replications (P^)', the fourth 
column denotes the observed (estimated) standard error of over 100 replications; and 
the fifth column denotes the rejection rate of testing the null hypothesis of no DIF over 100 
replications. The last three columns report the estimated mean, standard error, and the 
rejection rate of DIF using the Mantel-Haenszel statistic over 100 replications. The first 
row of Table 2, for example, denotes that item 4, from Table 1, was used in the studied 
subtest with .50 as the potential for DIF. The average amount of estimated DIF, over 100 
repUcations, was .022 with a standard error of .036. The null hypothesis of no DIF was 
rejected 18 out of 100 replications. The Mantel-Haenszel analyses indicate that for this 
item, the estimated mean of A^^was -.342 with an observed standard error of .435. The 
null hypothesis of no DIF was rejected 9 times out of 100 repUcations. 

As can be seen from Table 2, each of items 4, 5, 6, 7, and 8 was first tested individually for DIF, and then all five were tested collectively. That is, in each case the valid subtest consisted of 50 items and the studied subtest consisted of exactly one item, except for the last row, where the studied subtest consisted of all five items. The average amount of estimated DIF for individual items ranged from .022 to .035, indicating small DIF ($0 < \hat\beta_U < .05$) at the item level. When all five DIF items were included in the studied subtest, however, the amount of estimated DIF was amplified to .148, indicating large DIF ($\hat\beta_U \ge .1$). In other words, when all the DIF items act in concert, the difference in the expected test scores between the groups is about .15. Thus, from column three, it can be seen that at the item level each of these items is likely to be missed as a DIF item because of its low estimated DIF; nonetheless, at the test level the amplification is such that the total DIF is substantial. Similarly, from column five it can be seen that the rejection rate for individual items ranged from .17 to .23, while the rejection rate for all five items together jumped to .7, reflecting the cumulative effect of DIF. A comparison of SIBTEST results with those of Mantel-Haenszel shows that both procedures are consistent in their assessment of the direction of DIF, the amount of estimated DIF, and the standard error of estimate whenever a single item was considered.

Table 3 displays the results of Amplification Study 2. In this case the potential for DIF was increased to 1.0, and the sample sizes for the reference and focal groups were increased to 3000 and 1000, respectively. Items 9, 10, and 11 (from Table 1) were selected for this study. Similar to the results in Table 2, for individual DIF items the amount of estimated DIF was below the large range, here moderate ($.05 < \hat\beta_U < .1$). However, when all three DIF items were included in the studied subtest, the amount of estimated DIF was amplified to .225, indicating large DIF. That is, when all three DIF items act in concert, the estimated difference in the expected test score between the groups exceeds 0.2. A comparison of the results of SIBTEST with those of Mantel-Haenszel again showed that they are consistent and comparable whenever a single item was considered for DIF.

Table 4 displays the results of the Amplification and Cancellation Study. Each of the reference and focal groups contains 1000 examinees, with 25 items in the valid subtest. Items 1, 2, and 3, which depend upon $\theta$ and $\eta_1$, were used here with 0.5 as the potential for DIF against the focal group ($C_{\eta_1}$ positive). These studied items were tested individually and collectively for DIF against the focal group. Items 12, 13, and 14, which depend upon $\theta$ and $\eta_2$, were used with -0.5 as the potential for DIF, that is, DIF against the reference group ($C_{\eta_2}$ negative). These items were also studied individually and collectively for DIF against the reference group. Finally, all six items were used collectively, with their corresponding positive and negative DIF, to study DIF cancellation. As can be seen from Table 4, items 1, 2, and 3 together exhibit large positive DIF against the focal group ($\hat\beta_U = .188$), while items 12, 13, and 14 together exhibit large negative DIF, that is, DIF against the reference group ($\hat\beta_U = -.185$). However, when items 1, 2, 3, 12, 13, and 14 were combined in the studied subtest, the DIF canceled out at the test score level ($\hat\beta_U = -.002$). Thus, this test, in spite of having six DIF items, displays virtually no DIF at the test level. Note that SIBTEST was used both to detect the amplification of positive DIF for items 1, 2, and 3 and the amplification of negative DIF for items 12, 13, and 14, and to detect the cancellation resulting from the combined influence of all six studied items.
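One reason amplification and cancellation show up so directly at the test score level is that, when the valid subtest is held fixed, the cell mean of a sum of item scores is the sum of the item cell means, so $\hat\beta_U$ for a multi-item studied subtest is the sum of the item-level values. The Python sketch below checks this for the uncorrected estimator (raw cell means, an assumption; the regression correction is not modeled) on a hypothetical example in which one item favors the reference group and another favors the focal group. The reported simulation values are consistent with near-additivity: .188 + (-.185) = .003, close to the observed -.002, and five items at .022 to .035 each sum to between .11 and .175, bracketing the observed .148.

```python
from collections import defaultdict


def beta_uncorrected(x_ref, y_ref, x_foc, y_foc):
    """beta_hat from raw cell means (regression correction omitted)."""
    ref, foc = defaultdict(list), defaultdict(list)
    for x, y in zip(x_ref, y_ref):
        ref[x].append(y)
    for x, y in zip(x_foc, y_foc):
        foc[x].append(y)
    ks = [k for k in foc if k in ref]  # score levels present in both groups
    if not ks:
        return 0.0
    n_kept = sum(len(foc[k]) for k in ks)
    return sum(len(foc[k]) / n_kept
               * (sum(ref[k]) / len(ref[k]) - sum(foc[k]) / len(foc[k]))
               for k in ks)


# Hypothetical data: identical valid subtest scores in both groups.
x = [0, 0, 1, 1]
a_ref, a_foc = [1, 0, 1, 1], [0, 0, 1, 0]  # item A favors the reference group
b_ref, b_foc = [0, 0, 0, 1], [1, 0, 1, 1]  # item B favors the focal group
beta_a = beta_uncorrected(x, a_ref, x, a_foc)
beta_b = beta_uncorrected(x, b_ref, x, b_foc)
# Studied subtest = items A and B together, scored number correct.
ab_ref = [p + q for p, q in zip(a_ref, b_ref)]
ab_foc = [p + q for p, q in zip(a_foc, b_foc)]
beta_ab = beta_uncorrected(x, ab_ref, x, ab_foc)
```

Here item A alone gives $\hat\beta_U = .5$, item B alone gives $-.5$, and the two together cancel to 0, a miniature version of the Table 4 pattern.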

In summary, the simulation studies have demonstrated the effectiveness of 
SIBTEST in detecting DIF amplification and DIF cancellation. This was established for 
different sample sizes and test lengths. A comparison of SIBTEST results with those of Mantel-Haenszel at the item level shows that the two perform about equally well.

Real Data Study 

Description of the Data 

Three real data sets were used to investigate the effectiveness of SIBTEST in detecting amplification and cancellation of DIF in real applications. The data sets considered were the American College Testing program (ACT) mathematics test data, Form 39B, for males and females, and the National Assessment of Educational Progress (NAEP) 1986 history test data for males and females and for Blacks and Whites (NAEP, 1988). The mathematics data consist of 60 items, with 2115 males and 2885 females. The history data consist of 36 items, with 1225 males, 1215 females, 1711 Whites, and 447 Blacks. The analyses were carried out in the following manner.

For each of the data sets, DIF/DTF analyses were performed. That is, each item
was analyzed for DIF with the rest of the items forming the "valid subtest". In the first
stage (item level analyses), both SIBTEST and Mantel-Haenszel statistics were computed
and compared for each item. In the second stage (test level analyses), items that exhibited
moderate to large DIF according to both procedures were analyzed together to investigate
DIF amplification and cancellation. For these analyses, each studied subtest consisted of
items of one of three types: items favoring the focal group, items favoring the reference
group, or items favoring both groups (that is, some items favoring the reference group and
other items favoring the focal group). Thus an attempt was made to study both
amplification and cancellation from the DTF perspective.

Results of Real Data Study 

The results of the analyses of the mathematics data for males and females are shown in
Tables 5 and 6. Table 5 shows the results of the individual item analyses (that is, DIF
analyses). The items listed were identified as exhibiting DIF by both procedures, SIBTEST
and Mantel-Haenszel.⁴ The first half of Table 5 shows items exhibiting moderate
(.05 < β̂_U < .1) to large (β̂_U ≥ .1) amounts of DIF favoring males; that is, these items
show DIF against females. The second half of Table 5 shows items exhibiting moderate to
large amounts of DIF against males.
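The screening rule described here can be written down compactly. A minimal sketch, assuming the cut-offs quoted for Table 5 (.05 and .1 on |β̂_U|) and that positive β̂_U means DIF against the focal group (females in Table 5). The input format, pairing each item's SIBTEST estimate with a yes/no Mantel-Haenszel flag, and both function names are hypothetical; only items flagged by both procedures enter the studied subtests, mirroring the exclusion described in note 4.

```python
def dif_size(beta_hat, moderate=0.05, large=0.1):
    """Classify the size of an item-level estimate using the cut-offs quoted for Table 5."""
    size = abs(beta_hat)
    if size >= large:
        return "large"
    if size > moderate:
        return "moderate"
    return "negligible"


def studied_subtests(estimates, moderate=0.05):
    """Form the three types of studied subtest from items flagged by both procedures.

    `estimates` maps item -> (beta_hat, flagged_by_mh); this input format is hypothetical.
    Positive beta_hat is read as DIF against the focal group.
    """
    against_focal = sorted(i for i, (b, mh) in estimates.items() if mh and b > moderate)
    against_ref = sorted(i for i, (b, mh) in estimates.items() if mh and b < -moderate)
    return {
        "against focal": against_focal,
        "against reference": against_ref,
        "both": sorted(against_focal + against_ref),
    }


# Hypothetical item-level results, not values from the tables.
demo = {1: (0.12, True), 2: (0.07, True), 3: (0.06, False), 4: (-0.08, True), 5: (0.01, True)}
print({item: dif_size(b) for item, (b, _) in demo.items()})
print(studied_subtests(demo))
```

Item 3 is moderate by its SIBTEST estimate but is not flagged by Mantel-Haenszel, so it is left out of every studied subtest.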

Table 6 shows DIF amplification and cancellation effects for the items shown in Table 5.
For each studied subtest, Table 6 lists the items used; whether the studied items favor males
or females; the estimated amount of DIF (β̂_U); the value of the Shealy-Stout statistic (B of
Equation 1); and the associated p-value. The first row of Table 6 shows the DIF cancellation
effect of items 17 and 19 together. Item 17 favors males with large DIF, while item 19
favors females with large DIF, each at the item level. When these items were combined,
however, the DIF canceled out completely at the test level (β̂_U = -.0006). That is,
although each item favors a different group at the item level, at the test level the DIF
canceled out, resulting in no difference in the expected test scores of the two groups. The
second row of Table 6 shows DIF amplification for items showing moderate DIF, each
against females at the item level. The third row shows DIF amplification for items showing
moderate DIF, each against males at the item level. The last row shows DIF amplification
and cancellation when all items favoring males (with moderate and large DIF) and all items
favoring females are analyzed together. Because DIF amplification for the items favoring
males (β̂_U = .523) is larger in magnitude than DIF amplification for the items favoring
females (β̂_U = -.340), positive and negative DIF do not cancel completely when all DIF
items are combined. That is, some overall DTF against females remains for these items
(β̂_U = .294).
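The B and p columns of Table 6 can be related with a few lines of arithmetic. A minimal sketch, assuming B is β̂_U divided by its estimated standard error and referred to the standard normal distribution. The reported p-values appear to be one-sided: p = 1 - Φ(B) reproduces both .524 for B = -.06 (first row of Table 6) and .405 for B = 0.24 (last row of Table 8). Using the lower tail for tests against the reference group is an assumption, and the helper names are hypothetical.

```python
import math


def sibtest_b(beta_hat, se):
    """B = beta_hat / SE(beta_hat), approximately N(0, 1) when there is no DIF."""
    return beta_hat / se


def one_sided_p(b, against="focal"):
    """Upper tail for DIF against the focal group; lower tail otherwise (assumed)."""
    upper = 0.5 * math.erfc(b / math.sqrt(2))  # P(Z > b) for standard normal Z
    return upper if against == "focal" else 1.0 - upper


# Reported B values: Table 6, first row, and Table 8, last row.
for b in (-0.06, 0.24):
    print(b, round(one_sided_p(b), 3))  # p-values 0.524 and 0.405, matching the tables

# Standard error implied by the last row of Table 6 (beta_hat = .294, B = 4.68).
print(round(0.294 / 4.68, 4))
```

Working backwards from β̂_U and B in this way gives the implied standard error (about .063 for the last row), which is a quick check on whether a large B reflects a large β̂_U or simply a small standard error.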

Tables 7 and 8 show the results of the analyses of the history test for males and
females. Analogous to Table 5, Table 7 shows items exhibiting moderate to large amounts
of DIF, by both procedures, for both groups. Table 8 shows DIF amplification and
cancellation effects. In Table 7 there is only one item with large DIF favoring males; the
rest of the items exhibit moderate DIF. Therefore, Table 8 shows amplification results for
items favoring males only, amplification results for items favoring females only, and
amplification and cancellation results for all DIF items. As can be seen from the last row
of Table 8, there is almost total cancellation of DIF (β̂_U = .018) when all DIF items were
assessed together. Thus, no statistically significant DTF is present in this case (B = 0.24,
p = .405).

Tables 9 and 10 show the results of the analyses of the history test for Whites and
Blacks. As in the two previous cases, Table 9 shows DIF results at the item level and
Table 10 shows DTF results at the test level. Table 9 shows that very few items favor
Blacks relative to the number of items that favor Whites. Therefore, Table 10 contains only
amplification results for items favoring Whites and amplification and cancellation results
for all the DIF items from Table 9. As expected in this case, the magnitude of DIF
amplification against Blacks is large (β̂_U = 1.310), and when all DIF items were combined
there is only moderate DIF cancellation, with overall DTF remaining against Blacks
(β̂_U = 1.150).

In summary, the findings of the real data studies replicated those of the simulation
studies in the sense that both amplification and cancellation were demonstrated. The
results of the SIBTEST analyses at the item level were almost completely consistent with
those of the Mantel-Haenszel procedure, in both the direction and the amount of estimated
DIF. The amplification and cancellation results obtained with SIBTEST on real data
demonstrate its capability to address these issues in real settings. It should be emphasized
that the real data studies were DIF/DTF studies and not bias studies. These results are
encouraging for future applications of SIBTEST to the study of the cumulative effects of DIF
at the test score level.

For all three sets of real data, content analyses of the DIF items were performed in an
attempt to identify possible correlates of the occurrence of DIF and DTF. Upon studying the
mathematics items shown in Tables 5 and 6, it was found that items that favored males and
displayed amplification required analytical/geometry knowledge, such as properties of
triangles and trapezoids, angles in a circle, and the volume of a box, whereas items that
favored females and displayed amplification required computational knowledge, such as
factorization and solving equations. Based on these informal content analyses of the two
sets of items displaying amplification, one could cautiously conjecture that the
mathematics education of males may tend to develop understanding of analytical concepts,
while that of females may tend to develop computational skills. Similar conclusions were
drawn by Drasgow (1987) about the content of biased items in a different version of the
ACT mathematics test.

Similarly, analyses of the history items for the male-female comparison revealed
that items favoring males involved factual knowledge, such as the location of different
countries on the world map and the dates of certain historical events, whereas items
favoring females involved reasoning about the constitution, entrance into the League of
Nations, etc.

Content analyses of the history items for Blacks and Whites again revealed factual
knowledge items favoring Whites; that is, these items required knowledge of the location
of different countries on the world map, facts about World War II, etc. Only three items
favored Blacks, and no common secondary trait was evident among them. It was also
interesting to note that, across the three data sets, the difficulty level of the items that
exhibited DIF did not differ significantly from that of the rest of the items in the
respective tests. In other words, DIF was not related to item difficulty.

Summary and Discussion 

This paper has investigated DIF amplification and cancellation at the test score
level and SIBTEST's ability to detect and estimate each. Based on simulation as well as
real data analyses, SIBTEST demonstrated its effectiveness in assessing DIF at the item
level as well as at the test score level. As demonstrated, at the test score level the
cumulative effect of DIF can either amplify or cancel out, partially or completely. In
addition, at the item level of analysis, SIBTEST and Mantel-Haenszel gave mutually
consistent results.

If one wants to detect bias rather than merely detect DIF or DTF, SIBTEST requires a
valid subtest, which serves as an internally valid benchmark against which to assess bias.
On the face of it, this requirement may sound unrealistic. However, attempts by Ackerman
(1991a, 1991b) and others to obtain an empirically validated valid subtest seem promising
and could greatly assist in bias analyses. As an alternative to using the "valid" subtest
to match examinees, one could also use an external criterion of the intended-to-be-measured
ability, in concert with or instead of the valid subtest.

Study of DIF at the item level as well as at the test level can be very useful for test
construction. It is well known that item responses are multiply determined, in the sense
that multiple traits determine an examinee's response to each item. The decision to remove
or add items should not be based on item level analyses alone but should also consider the
effect of such items at the test level; for example, one could add or remove items in order
to balance the influence of one or more secondary traits. Moreover, since decisions about
individuals are made at the test score level, it is important to assess simultaneously the
cumulative effect, at the test score level, of several DIF items affecting different
subpopulations. As emphasized by other researchers (Drasgow, 1987; Humphreys, 1986;
Roznowski, 19897; Reith & Roznowski, 1991), inclusion of items with multiple determinants
could significantly improve both the predictive and the construct validity of a test. Based
on the analyses presented here, SIBTEST could greatly aid in this process.

Although a statistical hypothesis testing procedure can be useful in the detection of
test bias or DTF, it is important to distinguish statistically significant DTF from a
practically significant amount of DTF. This is because, with any statistical procedure,
small differences in group performance can produce a statistically significant result when
sample sizes are large. For example, Drasgow (1987) showed, using Lord's chi-square method,
that a large, significant chi-square statistic may reflect only moderate bias at the test
score level, even when one third of the items are biased. In the present study, for
example, it would be useful to know the practical significance of observing a β̂_U value
of .1, .5, 1.0, etc. at the test score level. The estimated index of DIF, β̂_U, should be
useful in assessing whether the amount of DIF present is of practical importance.
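The sample-size point can be made concrete. Assuming, as a rough approximation rather than a result of this paper, that the standard error of β̂_U shrinks in proportion to 1/√n as both groups grow, the same estimated DIF produces a B that doubles every time the sample size quadruples. The function name and all numbers below are hypothetical.

```python
import math


def b_at_sample_size(beta_hat, se_base, n_base, n_new):
    """B for a fixed beta_hat when the SE is assumed to scale like 1 / sqrt(n)."""
    se_new = se_base * math.sqrt(n_base / n_new)
    return beta_hat / se_new


# Hypothetical: a small test-level beta_hat of .05 with SE .04 at 1000 examinees per group.
for n in (1000, 4000, 16000):
    print(n, round(b_at_sample_size(0.05, 0.04, 1000, n), 2))
```

At a one-sided α = .05 (critical value about 1.645), the same β̂_U = .05 would be nonsignificant with 1000 examinees per group (B = 1.25) but significant with 4000 (B = 2.5), although the estimated amount of DIF, and hence its practical importance, is unchanged.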

SIBTEST, although derived using IRT, uses simple means and variances of scores on the
valid and studied subtests to obtain its test statistics. It is computationally simple and
does not involve IRT parameter estimation, thereby avoiding estimation problems. The
simulation and real data studies of this paper have demonstrated SIBTEST's potential for
assessing amplification and cancellation of DIF in a variety of situations. Nonetheless,
more studies with varied sample sizes, test lengths, and diverse contexts would be useful to
further establish its empirical utility. Menu-driven code and a user's manual are available
on request for interested users.
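To illustrate how a statistic can be built from "simple means and variances", here is a simplified sketch. It follows the general form of the Shealy-Stout estimator: a sum, over valid-subtest scores k, of differences between reference and focal mean studied-subtest scores, weighted by the proportion of focal examinees at k, with a standard error built from the within-score variances. It deliberately omits the regression correction that SIBTEST applies to the conditional means (which matters chiefly when the groups differ in target ability), and the rule for dropping sparse score levels is an assumption, so it should not be taken as the program's implementation. The function name and the simulated data are hypothetical.

```python
import numpy as np


def sibtest_like(y_ref, x_ref, y_foc, x_foc, min_cell=2):
    """Return (beta_hat, se, B) for a studied subtest, without regression correction.

    y_*: studied-subtest scores; x_*: valid-subtest scores used for matching.
    Score levels with fewer than `min_cell` examinees in either group are dropped
    (an assumed rule), and the focal-group weights are renormalized over the rest.
    """
    levels = [k for k in np.unique(x_foc)
              if (x_ref == k).sum() >= min_cell and (x_foc == k).sum() >= min_cell]
    if not levels:
        raise ValueError("no score level has enough examinees in both groups")
    n_used = sum((x_foc == k).sum() for k in levels)
    beta, var = 0.0, 0.0
    for k in levels:
        yr, yf = y_ref[x_ref == k], y_foc[x_foc == k]
        p_k = len(yf) / n_used
        beta += p_k * (yr.mean() - yf.mean())
        var += p_k**2 * (yr.var(ddof=1) / len(yr) + yf.var(ddof=1) / len(yf))
    se = float(np.sqrt(var))
    return float(beta), se, float(beta) / se


# Simulated data: 3 studied items that are slightly harder for the focal group.
rng = np.random.default_rng(1)
x_ref = rng.integers(0, 21, size=1000)
x_foc = rng.integers(0, 21, size=1000)
y_ref = rng.binomial(3, 0.3 + 0.02 * x_ref)
y_foc = rng.binomial(3, 0.3 + 0.02 * x_foc - 0.05)
beta_hat, se, b = sibtest_like(y_ref, x_ref, y_foc, x_foc)
print(f"beta_hat={beta_hat:.3f}  se={se:.3f}  B={b:.2f}")
```

Because the simulated groups have the same matching-score distribution, the omitted regression correction matters little here; that would not hold for real groups that differ in target ability.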
Notes

¹ If P(θ, η) denotes the item characteristic curve, then the marginal P(θ) is obtained by
integrating η out of P(θ, η) using the conditional density f(η | θ):
P(θ) = ∫ P(θ, η) f(η | θ) dη.
P(θ) is interpreted as the probability that a randomly chosen examinee with target ability θ
gets the item right.
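The integration in note 1 can be carried out numerically. A minimal sketch, assuming a compensatory two-dimensional logistic item and a normal conditional density f(η | θ); both are illustrative choices, not the models used in the paper, and the function names and parameter values are hypothetical. Gauss-Hermite quadrature (NumPy's `hermegauss`, whose weight function is exp(-z²/2)) approximates the integral. Giving the two groups different conditional means of η at the same θ yields different marginal curves, which is the way DIF arises in the multidimensional view the paper builds on.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def icc_2d(theta, eta, a_theta=1.0, a_eta=0.8, b=0.0):
    """Hypothetical compensatory two-dimensional logistic item P(theta, eta)."""
    return 1.0 / (1.0 + np.exp(-(a_theta * theta + a_eta * eta - b)))


def marginal_icc(theta, mean_eta, sd_eta, n_nodes=40, **item):
    """P(theta) = integral of P(theta, eta) f(eta | theta) d(eta), f normal (assumed)."""
    z, w = hermegauss(n_nodes)  # nodes and weights for the weight exp(-z**2 / 2)
    eta = mean_eta + sd_eta * z
    return float(np.sum(w * icc_2d(theta, eta, **item)) / np.sqrt(2.0 * np.pi))


# Same theta, but eta given theta is higher on average in the reference group.
p_ref = marginal_icc(0.0, mean_eta=0.5, sd_eta=1.0)
p_foc = marginal_icc(0.0, mean_eta=0.0, sd_eta=1.0)
print(round(p_ref, 3), round(p_foc, 3))  # the reference-group probability is the larger one
```

If the item did not depend on η (a_eta = 0), both groups would have the same marginal curve and the item would show no DIF, whatever the distribution of η.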

² For some applications, it can make more sense to use the reference group examinees or the
entire group of examinees.

³ Generally one finds nonzero differences in group means on the target ability (that is,
d_θ ≠ 0). However, there are many realistic situations in which no difference in group
means exists. In the present study d_θ was taken as zero mainly to keep the design simple.
The effectiveness of SIBTEST in detecting DIF for varying d_θ values has been demonstrated
by Shealy and Stout (1992b) and by Roussos (1992); in these studies d_θ was used as a factor
in the experimental design.

⁴ Across the three data sets (132 items in total), there were seven items for which the
SIBTEST and Mantel-Haenszel analyses were inconsistent. Three items exhibited DIF according
to SIBTEST only and four items according to Mantel-Haenszel only. These items were not
included in the studied subtests.

Table 1

Item Parameters of Studied Subtests
for Simulation Studies
Table 2

Amplification Study 1
J_R = 500, J_F = 500, N = 50, d_θ = 0, α = .05

SIBTEST    Mantel-Haenszel

Item

IT

Rejection rate

MH

Rejection rate

.50
.022
.036
.18
-.342
.435
.09
.031
.031
.17
.398
.15
.035
.23
.423
.22
.039
.445
4,5,6,
.70
Table 3

Amplification Study 2
J_R = 1000, J_F = 3000, N = 50, d_θ = 0, α = .05

                                SIBTEST                          Mantel-Haenszel
Item       DIF potential   β̂_U    SE(β̂_U)   Rejection rate   λ̂        SE(λ̂)   Rejection rate
9          1.0              .062    .015      .99              -.996     .223     1.00
10         1.0              .087    .019      1.00             -1.140    .272     1.00
11         1.0              .096    .019      1.00             -1.248    .256     1.00
9,10,11    1.0              .225    .028      1.00             -         -        -



Table 4

Amplification and Cancellation Study
J_R = 1000, J_F = 1000, N = 25, d_θ = 0, α = .05

rate
12
-0.5
1.00
.022
.82
.021
.98
12,13,14
.036
1.00
1,2,3
.061
.02
12,13,14













Table 5

Results of Mathematics Test: Males vs Females
Item Level DIF Analyses: SIBTEST & Mantel-Haenszel*

Items favoring males
  .05 < β̂_U < .1      23, 32, 34, 38, 48, 52, 58
  β̂_U ≥ .1            17

Items favoring females
  .05 < -β̂_U < .1     4, 5, 9, 14, 29
  -β̂_U ≥ .1           19

* These items were identified as exhibiting DIF by both SIBTEST and Mantel-Haenszel.



Table 6

Results of Mathematics Test: Males vs Females
DTF Amplification and Cancellation: SIBTEST

Items of the studied subtest        Favors   Favors    β̂_U      B        p
                                    males    females
17, 19                              -        -         -.0006   -.06     .524
23, 32, 34, 38, 48, 52, 58          yes      -         0.523    12.85    .000
4, 5, 9, 14, 29                     -        yes       -.340    -10.15   .000
23, 32, 34, 38, 48, 52, 58, 17,     yes      -         0.294    4.68     .000
  4, 5, 9, 14, 29, 19

Table 7

Results of History Test: Males vs Females
Item Level DIF Analyses: SIBTEST & Mantel-Haenszel

Items favoring males
  .05 < β̂_U < .1      12, 15, 25, 30
  β̂_U ≥ .1            1

Items favoring females
  .05 < -β̂_U < .1     9, 11, 22, 24, 34
  -β̂_U ≥ .1           -



Table 8

Results of History Test: Males vs Females
DIF Amplification and Cancellation: SIBTEST

Items of the studied subtest        Favors   Favors    β̂_U      B       p
                                    males    females
12, 15, 25, 30, 1                   yes      -         0.437    9.02    .000
9, 11, 22, 24, 34                   -        yes       -.381    -7.87   .000
12, 15, 25, 30, 1, 9, 11, 22,       -        -         0.018    0.24    .405
  24, 34
Table 9

Results of History Test: Whites vs Blacks
Item Level DIF Analyses: SIBTEST & Mantel-Haenszel

Items favoring Whites

Items favoring Blacks

.05 < β̂_U < .1     β̂_U ≥ .1

.05 < -β̂_U < .1    -β̂_U ≥ .1


7, 11, 12, 16, 13, 14, 15 


3, 4, 5 


35 17, 32, 36 





Table 10

Results of History Test: Whites vs Blacks
DIF Amplification and Cancellation: SIBTEST

Items of the studied subtest          Favors   Favors    β̂_U     B      p
                                      Whites   Blacks
All items favoring Whites only        yes      -         1.310   9.96   .000
All items favoring Whites and Blacks  yes      -         1.150   7.43   .000